In this project, I’m going to use Machine Learning to Optimize the Product Backorders.
Backorder is an order which has not been fulfilled yet by company. It indicates the interest of consumer in the product even though the product is in short amount. This is both and good for the company. Good because it shows customer is still interested in the product and demands for it. Bad because if not fulfilled in time the consumer may lose interest, look for alternative product which will result in the loss of company, losing customers and image of the company may be distorted.
Now, what company can do is built so many of the products that there won’t be shortage. But most of the companies can’t do it because of the high inventory cost. And if demand decreases, the will suffer quite a loss.
So, it is better to look at the past data and optimize the current backorder such that the inventory cost is low, product is delivered in time before the conumer loses the interest. This will good for both consumers who get the product they want with only little wait and for company which retains the customers and make profit.
There are a lot of challenges in building the predictive model for optimization of backorders. There are lot of factors which doesn’t depend on the company, product or business but external factors like holiday, season, special occassions etc. So let’s see what we are going to do.
The data we are using here is obtained from Kaggle. You can obtain the data from here.
library(data.table)
library(tidyquant)
library(unbalanced)
library(randomForest)
library(caret)
library(h2o)
train <- read.csv("train.csv", na.strings = "")
test <- read.csv("test.csv", na.strings = "")
Let’s have a look at the data
str(train)
'data.frame': 1687861 obs. of 23 variables:
$ sku : Factor w/ 1687861 levels "(1687860 rows)",..: 2 3 4 5 6 7 8 9 10 11 ...
$ national_inv : int 0 2 2 7 8 13 1095 6 140 4 ...
$ lead_time : int NA 9 NA 8 NA 8 NA 2 NA 8 ...
$ in_transit_qty : int 0 0 0 0 0 0 0 0 0 0 ...
$ forecast_3_month : int 0 0 0 0 0 0 0 0 15 0 ...
$ forecast_6_month : int 0 0 0 0 0 0 0 0 114 0 ...
$ forecast_9_month : int 0 0 0 0 0 0 0 0 152 0 ...
$ sales_1_month : int 0 0 0 0 0 0 0 0 0 0 ...
$ sales_3_month : int 0 0 0 0 0 0 0 0 0 0 ...
$ sales_6_month : int 0 0 0 0 0 0 0 0 0 0 ...
$ sales_9_month : int 0 0 0 0 4 0 0 0 0 0 ...
$ min_bank : int 0 0 0 1 2 0 4 0 0 0 ...
$ potential_issue : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ pieces_past_due : int 0 0 0 0 0 0 0 0 0 0 ...
$ perf_6_month_avg : num -99 0.99 -99 0.1 -99 0.82 -99 0 -99 0.82 ...
$ perf_12_month_avg: num -99 0.99 -99 0.13 -99 0.87 -99 0 -99 0.87 ...
$ local_bo_qty : int 0 0 0 0 0 0 0 0 0 0 ...
$ deck_risk : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 1 2 2 1 1 ...
$ oe_constraint : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ ppap_risk : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 2 1 1 ...
$ stop_auto_buy : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
$ rev_stop : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ went_on_backorder: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
str(test)
'data.frame': 242076 obs. of 23 variables:
$ sku : Factor w/ 242076 levels "(242075 rows)",..: 167 213 440 599 690 1042 1155 1195 1288 1407 ...
$ national_inv : int 62 9 17 9 2 15 0 28 2 2 ...
$ lead_time : int NA NA 8 2 8 2 NA NA NA NA ...
$ in_transit_qty : int 0 0 0 0 0 0 0 0 0 0 ...
$ forecast_3_month : int 0 0 0 0 0 0 0 0 0 0 ...
$ forecast_6_month : int 0 0 0 0 0 0 0 0 0 0 ...
$ forecast_9_month : int 0 0 0 0 0 0 0 0 0 0 ...
$ sales_1_month : int 0 0 0 0 0 0 0 0 0 0 ...
$ sales_3_month : int 0 0 0 0 0 0 0 0 0 0 ...
$ sales_6_month : int 0 0 0 0 0 1 0 0 0 0 ...
$ sales_9_month : int 0 0 0 2 0 2 0 0 0 0 ...
$ min_bank : int 1 1 0 0 0 0 0 0 0 0 ...
$ potential_issue : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ pieces_past_due : int 0 0 0 0 0 0 0 0 0 0 ...
$ perf_6_month_avg : num -99 -99 0.92 0.78 0.54 0.37 -99 -99 -99 -99 ...
$ perf_12_month_avg: num -99 -99 0.95 0.75 0.71 0.68 -99 -99 -99 -99 ...
$ local_bo_qty : int 0 0 0 0 0 0 0 0 0 0 ...
$ deck_risk : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 1 1 2 2 1 ...
$ oe_constraint : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ ppap_risk : Factor w/ 2 levels "No","Yes": 1 2 1 2 1 1 1 1 2 1 ...
$ stop_auto_buy : Factor w/ 2 levels "No","Yes": 2 1 2 2 2 2 2 2 2 2 ...
$ rev_stop : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
$ went_on_backorder: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
Let’s have a look at the target variable
table(train$went_on_backorder)
No Yes
1676567 11293
As we can see the data is highly unbalanced data. Since, we are focused on optimizing the backorder, we need more occurance of backorder in our train data. So, we need to balance the data.
summary(train)
sku national_inv lead_time in_transit_qty forecast_3_month
(1687860 rows): 1 Min. : -27256 Min. : 0.00 Min. : 0.0 Min. : 0.0
1026827 : 1 1st Qu.: 4 1st Qu.: 4.00 1st Qu.: 0.0 1st Qu.: 0.0
1043384 : 1 Median : 15 Median : 8.00 Median : 0.0 Median : 0.0
1043696 : 1 Mean : 496 Mean : 7.87 Mean : 44.1 Mean : 178.1
1043852 : 1 3rd Qu.: 80 3rd Qu.: 9.00 3rd Qu.: 0.0 3rd Qu.: 4.0
1044048 : 1 Max. :12334404 Max. :52.00 Max. :489408.0 Max. :1427612.0
(Other) :1687855 NA's :1 NA's :100894 NA's :1 NA's :1
forecast_6_month forecast_9_month sales_1_month sales_3_month sales_6_month
Min. : 0 Min. : 0 Min. : 0.0 Min. : 0 Min. : 0.0
1st Qu.: 0 1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 0 1st Qu.: 0.0
Median : 0 Median : 0 Median : 0.0 Median : 1 Median : 2.0
Mean : 345 Mean : 506 Mean : 55.9 Mean : 175 Mean : 341.7
3rd Qu.: 12 3rd Qu.: 20 3rd Qu.: 4.0 3rd Qu.: 15 3rd Qu.: 31.0
Max. :2461360 Max. :3777304 Max. :741774.0 Max. :1105478 Max. :2146625.0
NA's :1 NA's :1 NA's :1 NA's :1 NA's :1
sales_9_month min_bank potential_issue pieces_past_due perf_6_month_avg
Min. : 0 Min. : 0.00 No :1686953 Min. : 0.00 Min. :-99.000
1st Qu.: 0 1st Qu.: 0.00 Yes : 907 1st Qu.: 0.00 1st Qu.: 0.630
Median : 4 Median : 0.00 NA's: 1 Median : 0.00 Median : 0.820
Mean : 525 Mean : 52.77 Mean : 2.04 Mean : -6.872
3rd Qu.: 47 3rd Qu.: 3.00 3rd Qu.: 0.00 3rd Qu.: 0.970
Max. :3205172 Max. :313319.00 Max. :146496.00 Max. : 1.000
NA's :1 NA's :1 NA's :1 NA's :1
perf_12_month_avg local_bo_qty deck_risk oe_constraint ppap_risk stop_auto_buy
Min. :-99.000 Min. : 0.000 No :1300377 No :1687615 No :1484026 No : 61086
1st Qu.: 0.660 1st Qu.: 0.000 Yes : 387483 Yes : 245 Yes : 203834 Yes :1626774
Median : 0.810 Median : 0.000 NA's: 1 NA's: 1 NA's: 1 NA's: 1
Mean : -6.438 Mean : 0.626
3rd Qu.: 0.950 3rd Qu.: 0.000
Max. : 1.000 Max. :12530.000
NA's :1 NA's :1
rev_stop went_on_backorder
No :1687129 No :1676567
Yes : 731 Yes : 11293
NA's: 1 NA's: 1
head(train, 10)
tail(train)